Microsoft COCO: Common Objects in Context
Authors
Abstract
We present a new dataset with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting, and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.
Related resources
ChatPainter: Improving Text to Image Generation using Dialogue
Synthesizing realistic images from text descriptions on a dataset like Microsoft Common Objects in Context (MS COCO), where each image can contain several objects, is a challenging task. Prior work has used text captions to generate images. However, captions might not be informative enough to capture the entire image and insufficient for the model to be able to understand which objects in the i...
Oracle MCG: A first peek into COCO Detection Challenges
Microsoft COCO [2] is a new annotated database in computer vision consisting of more than 200,000 images. There are currently more than one million annotated objects from 80 categories, with fully segmented masks. With respect to Pascal [1], the previously available dataset with semantic segmentation annotations, COCO has four times the number of categories and two orders of magnitude more images...
Fine-tuning deep CNN models on specific MS COCO categories
Fine-tuning of a deep convolutional neural network (CNN) is often desired. This paper provides an overview of our publicly available py-faster-rcnn-ft software library that can be used to fine-tune the VGG CNN M 1024 model on custom subsets of the Microsoft Common Objects in Context (MS COCO) dataset. For example, we improved the procedure so that the user does not have to look for suitable image file...
Describing Common Human Visual Actions in Images (Ronchi and Perona)
Which common human actions and interactions are recognizable in monocular still images? Which involve objects and/or other people? How many is a person performing at a time? We address these questions by exploring the actions and interactions that are detectable in the images of the MS COCO dataset. We make two main contributions. First, a list of 140 common ‘visual actions’, obtained by analyz...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014